Abstract

We study an analog of the well-known Gel’fand Pinsker Channel which uses quantum states for 
the transmission of the data. We consider the case where both the sender’s inputs to the channel and 
the channel states are to be taken from a finite set (cq-channel with state information at the sender). 
We distinguish between causal and non-causal channel state information at the sender. The receiver 
remains ignorant, throughout. We give a single-letter description of the capacity in the first case. In 
the second case we present two different regularized expressions for the capacity. It is an astonishing 
and unexpected result of our work that a simple change from causal to non-causal channel state 
information at the encoder causes the complexity of a numerical computation of the capacity formula 
to change from trivial to seemingly difficult. Still, even the non-single letter formula allows one to 
draw nontrivial conclusions, for example regarding continuity of the capacity with respect to changes 
in the system parameters. 

The direct parts of both coding theorems are based on a special class of POVMs which are derived 
from orthogonal projections onto certain representations of the symmetric groups. This approach 
supports a reasoning that is inspired by the classical method of types. In combination with the 
non-commutative union bound these POVMs yield an elegant method of proof for the direct part of 
the coding theorem in the first case. 


I Introduction 

We investigate an information transmission problem where a sender (Alice) wants to reliably transmit 
messages to a receiver (Bob) under the influence of a noisy environment. The problem statement itself 
is rather generic in information theory, and has been addressed in many publications so far. The specific 
situation that we investigate here is one where the sender has advanced knowledge as compared to the 
receiver. This model was first introduced in the case of causal state knowledge by Shannon [JJ] who also 
derived a single-letter capacity formula and then extended to the case of non-causal state information 
by Gel’fand and Pinsker in [28].

Later, Costa m developed the widely known method “writing on dirty paper” which makes the ideas 
of Gel’fand and Pinsker also practically useful. Another practically important technique which is based 
on the work of Gel’fand and Pinsker is [53]. Their model has also been extended to quantum systems
and a coding theorem for entanglement assisted message transmission has been proven in [25].

We concentrate here on a version of coding with (partial) state knowledge where the channel output is 
a quantum system, while the input system is a classical system. We restrict to classical input variables 
such that the optimization gets restricted to the right choice of code words at the encoder plus a positive 
operator valued measurement (POVM) for the decoding at Bob’s site. There are many equivalent ways 
to write down the model but we will confine ourselves here to a version where the channel $W_{S\times X\to\mathcal K}$
has input alphabets $S$, $X$ and the output quantum system is modelled on the finite dimensional Hilbert
space $\mathcal K$. Throughout we assume that $|X|, |S| < \infty$ and that the inputs $s \in S$ (the channel states) are
selected at random according to some distribution $p \in \mathcal P(S)$. Both sender and receiver get to know
$p$. While we generally assume that the outcomes of the random process are revealed to the encoder
prior to the start of message transmission, we consider two different scenarios here: One where this
knowledge is non-causal in the sense that, over $n \in \mathbb N$ transmissions over the same memoryless channel
and under i.i.d. selection for the channel states $s$, the sender can make his encoding dependent on the
whole sequence $s^n = (s_1,\dots,s_n)$, and a second situation where to any given message $m$ the components
$x_1(m),\dots,x_i(m)$ of the corresponding code word $x^n(m)$ can only depend on $s_1,\dots,s_i$, but not on
$s_{i+1},\dots,s_n$. Throughout, the receiver has no direct knowledge about the realization $s^n$, although it may
generally be possible for him to obtain such knowledge by suitable measurements. We will however not 
study such tasks in this work but rather stay focused on the task of message transmission. 

As indicated already we assume the channel itself to be memoryless and the choice of state sequences is 
i.i.d. according to p. 

We provide a single-letter coding theorem for the case where state information is available only 
causally and a multi-letter coding theorem for the case where state information is non-causal. We note 
that this is a somewhat unsatisfactory situation - originally, the main success of information theory was 
to reduce a seemingly intractable and highly complex problem (finding the supremum over all achievable 
message transmission rates for a given memoryless channel) to a simple convex optimization problem. 
Since then, the capacity of an information transmission system could be calculated easily and it was 
possible to use the capacity as a benchmark for coding strategies. 

While working on this problem, we noted that the last decade has seen numerous examples of information 
transmission systems which do at present not admit a single-letter description. Rather, the currently 
available capacity formulae often require to calculate the limit of a sequence of numbers which are each 
the result of a convex optimization problem: 

$$C(a_1,\dots,a_N) = \lim_{n\to\infty}\ \max_{(b_1,\dots,b_N)\in PLT} f_n(b_1,\dots,b_N,a_1,\dots,a_N), \tag{1}$$

where $a_1,\dots,a_N$ are parameters describing the information carrier and $PLT \subset V$ is a problem specific
(convex) subset of some $N$-dimensional vector space $V$. Examples came especially from the area of
quantum information and can be broadly separated into two bins: One where the capacity of the cq 
channel (corresponding to the model which is treated here when |S| = 1) is treated with and without 
additional constraints like e.g. secrecy and one where the entanglement transmission or generation
capacity of quantum channels is investigated. Of course there are many more things one can do with a
quantum channel but the last two areas show some interesting features: They are sufficiently close to the
model treated here by us, they are related to one another through the work [22] and they illustrate the
difficulties in finding single-letter capacity formulae.

While first steps in classical information theory were enormously successful (like for example Shannon’s
pioneering work [48]), already the search for a single-letter capacity formula for the zero-error capacity
led to severe problems (Shannon was only able to obtain single-letter lower bounds on the capacity in
that case. He conjectured that the zero-error capacity is additive [49] in 1956, a conjecture that was
disproven by Alon 42 years later in 1998 [5]). Apart from such remarkable stories, classical information
theory does by now contain an abundance of partial results on seemingly trivial problems, for example:
Ahlswede’s work [2] gives a non-single letter formula for the capacity region of the interference channel,
one of the core problems of classical information theory. The celebrated works [29] of Han and Kobayashi
and [39] of Marton provide single-letter lower bounds on the capacity region. Even when it comes to
simpler problems involving only three parties we encounter this type of problem, for example for the
wiretap channel with feedback [^, with side information [16] or the compound wiretap channel [38].
For the arbitrarily varying channel under the maximal error criterion with feedback, a non-single letter 
capacity formula could be given by Ahlswede and Cai in [4]. For the arbitrarily varying wiretap channel, 
only a single-letter lower bound m could be given. In many cases, the single-letter lower bounds can be 
alternatively represented as a regularized capacity formula. 

We now concentrate on results in quantum information theory again: The capacity of the cq channel has 
been determined in [34] and in m- Prior to that it had been an open problem for more than 20 years 
after the work [33]. At that time it was even unclear whether it was additive or not. The entanglement
transmission capacity of quantum channels has been determined in [10, 22, 50, 47]. It was proven later
gniisi] that it is not additive and even shows super-activation. Recent results in classical information 
theory [niiMiiig show that such effects may even occur for classical systems with an eavesdropper or, 
more generally, when the number of available resources which may or may not be used jointly and which 
may or may not be available to some of the parties becomes large enough. 

In the comparison of our coding theorem with other results in quantum information we became aware of 
the fact that the (strong) secrecy capacity of a system with fixed signal states and two quantum receivers, 
one being the legal receiver and the other an illegitimate eavesdropper, is given by a multi-letter formula 
as well mun]. Moreover, the secrecy capacity of certain classical-quantum wiretap channels and the 
entanglement generation capacity of a quantum channel are related via the work [22] of Devetak, who was
able to derive a way of turning a quantum channel into a cq-channel with one legal and one illegitimate 
receiver. He then showed how to transform private codes for the cq channel into entanglement generation 
codes for the quantum channel. 

We also noted that the problems [51111111111] have one thing in common: They are generalizations of 
information-theoretic problems where the known proofs of the converse parts use Csiszar’s sum identity. 
Despite the lack of efficiency and elegance of a regularized expression of a capacity the recent work [14]
was the first to demonstrate that nontrivial insights may be gained even from a regularized expression: In
[14] it was proven that the message transmission capacity of an arbitrarily varying channel with quantum
input for the sender and quantum output system at the receiver’s side is not continuous in general, but is
always continuous if assisted by a small (private) amount of shared randomness between sender and receiver.
In addition to that, [14] gives exact conditions under which discontinuities arise and characterizes
them in terms of functions which are continuous themselves, although they are not given in a single-letter 
form. 

Coming back to classical systems we note that the capacity of the Gel’fand Pinsker channel (our model 
with non-causal information given to the sender and a channel satisfying $[W_s(x), W_{s'}(x')] = 0$ for all
$s,s' \in S$ and $x,x' \in X$, where $[\cdot,\cdot]$ denotes the commutator of the respective quantum states) has been
given a single-letter form in the pioneering work [28] but that a trivial capacity formula could be derived
by taking the respective formula for the case with causal information at the sender and then regularizing 
it. 

We follow this route in our work at least partially when it comes to proving the converse, although we 
are also able to give a different characterization which pays more attention to the specific structure of the 
problem as well. The direct part of our coding theorem for the case of non-causal information is based 
on an approach that was developed in [41] . This approach is slightly closer to what is known classically 
as a “method of types” than previously used approaches in quantum information were. Such approaches 
include for example [3511311^ [IHj. 

II Notation 

All Hilbert spaces are assumed to have finite dimensions and are over the field $\mathbb C$. The set of linear
operators from $\mathcal K$ to $\mathcal K$ is denoted $\mathcal B(\mathcal K)$. The adjoint of $b \in \mathcal B(\mathcal K)$ is marked by a star and written $b^*$.
$\mathcal S(\mathcal K)$ is the set of states, i.e. positive semi-definite operators with trace $1$ (the trace function on $\mathcal B(\mathcal K)$ is
written as $\mathrm{tr}$) acting on the Hilbert space $\mathcal K$. Pure states are given by projections onto one-dimensional
subspaces. A vector $x \in \mathcal K$ of length one spanning such a subspace will therefore be referred to as a
state vector, the corresponding state is typically written as $|x\rangle\langle x|$ and will, due to lengthy formulas, be
abbreviated as $\psi_x$ in this document. A classical-quantum channel (cq-channel) with input alphabet $X$
and output system $\mathcal S(\mathcal K)$ is a map that assigns to each element $x \in X$ a corresponding quantum state
$\rho_x \in \mathcal S(\mathcal K)$. The set of all such maps is abbreviated $CQ(X,\mathcal K)$.

For a finite set $X$ the notation $\mathcal P(X)$ is reserved for the set of probability distributions on $X$, and $|X|$
denotes its cardinality. The set $\mathcal P(X)$ can be embedded into a Hilbert space of dimension $|X|$ by choosing
any set $\{\psi_x\}_{x\in X}$ of pairwise orthogonal rank-one states and mapping each $p \in \mathcal P(X)$ to $\sum_{x\in X} p(x)\psi_x$.
We will use this kind of embedding in the converse parts of our proofs in order to deliver a consistent
connection to standard estimates in quantum information theory. Given two alphabets $X$ and $Y$ we will
sometimes denote elements of $\mathcal P(X \times Y)$ by e.g. $p_{XY}$, and in that case it is understood that $p_X \in \mathcal P(X)$
and $p_Y \in \mathcal P(Y)$ denote the respective marginal distributions of $p_{XY}$.

The set of channels (stochastic matrices) from an alphabet $X$ to another alphabet $Y$ is written $Ch(X, Y)$.
Its elements $V$ map any given symbol $x \in X$ to the symbol $y \in Y$ with probability $v(y|x) \in [0,1]$.

For any $n \in \mathbb N$, we define $X^n := \{(x_1,\dots,x_n) : x_i \in X\ \forall i \in \{1,\dots,n\}\}$; we also write $x^n$ for the elements
of $X^n$. Given such an element, $N(\cdot|x^n)$ denotes its type, and is defined through $N(x|x^n) := |\{i : x_i = x\}|$.
Normalized types are defined as $\bar N(x|x^n) := \frac{1}{n}N(x|x^n)$ for all $x^n \in X^n$ and $x \in X$. For any natural
number $n \in \mathbb N$, the notion of type defines a subset $\mathcal P_0^n(X) \subset \mathcal P(X)$ via $\mathcal P_0^n(X) := \{\bar N(\cdot|x^n) : x^n \in X^n\}$.
For any natural number $L$, we define $[L]$ to be the shortcut for the set $\{1,\dots,L\}$.
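The notion of a type can be made concrete with a few lines of Python. The following is a minimal sketch; the function names `type_counts` and `normalized_type` and the example sequence are illustrative assumptions and do not appear in the paper.

```python
from collections import Counter
from fractions import Fraction

def type_counts(seq, alphabet):
    """N(x|x^n): number of positions i with x_i = x, for every x in the alphabet."""
    counts = Counter(seq)
    return {x: counts.get(x, 0) for x in alphabet}

def normalized_type(seq, alphabet):
    """Normalized type: the counts divided by n, an element of P(X)."""
    n = len(seq)
    return {x: Fraction(c, n) for x, c in type_counts(seq, alphabet).items()}

# Hypothetical example: a sequence of length n = 6 over a three-letter alphabet.
alphabet = ("a", "b", "c")
x_n = ("a", "b", "a", "a", "c", "b")
counts = type_counts(x_n, alphabet)
ntype = normalized_type(x_n, alphabet)
print(counts)
print(ntype)
```

Exact fractions are used so that the normalized type sums to one without rounding error.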

The von Neumann entropy of a state $\rho \in \mathcal S(\mathcal K)$ is given by

$$S(\rho) := -\mathrm{tr}(\rho\log\rho), \tag{2}$$

where log(-) denotes the base two logarithm which is used throughout this work. 

Given a cq-channel $W \in CQ(X,\mathcal K)$ and a probability distribution $p \in \mathcal P(X)$, the Holevo quantity of $p$
and $W$ is defined as

$$\chi(p,W) := S\Big(\sum_{x\in X} p(x)\rho_x\Big) - \sum_{x\in X} p(x)S(\rho_x). \tag{3}$$
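As a hedged illustration of definitions (2) and (3), the following sketch evaluates the Holevo quantity for ensembles of qubit states. It is restricted to $2\times 2$ density matrices so that the eigenvalues can be written in closed form without external libraries; the helper names and the example states are assumptions made for this illustration only.

```python
import math

def qubit_eigenvalues(rho):
    """Eigenvalues of a 2x2 Hermitian matrix [[a, b], [conj(b), d]]."""
    a, b, d = rho[0][0].real, rho[0][1], rho[1][1].real
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return (mean + radius, mean - radius)

def von_neumann_entropy(rho):
    """S(rho) = -tr(rho log2 rho), with the convention 0 log 0 = 0."""
    return -sum(ev * math.log2(ev) for ev in qubit_eigenvalues(rho) if ev > 1e-12)

def holevo(p, states):
    """chi(p, W) = S(sum_x p(x) rho_x) - sum_x p(x) S(rho_x)."""
    avg = [[sum(p[x] * states[x][i][j] for x in range(len(p))) for j in range(2)]
           for i in range(2)]
    return von_neumann_entropy(avg) - sum(
        p[x] * von_neumann_entropy(states[x]) for x in range(len(p)))

ket0 = [[1.0, 0.0], [0.0, 0.0]]   # |0><0|
ket1 = [[0.0, 0.0], [0.0, 1.0]]   # |1><1|
plus = [[0.5, 0.5], [0.5, 0.5]]   # |+><+|

chi_orthogonal = holevo([0.5, 0.5], [ket0, ket1])     # perfectly distinguishable signals
chi_identical = holevo([0.5, 0.5], [ket0, ket0])      # signals carry no information
chi_nonorthogonal = holevo([0.5, 0.5], [ket0, plus])  # strictly between the two extremes
```

Orthogonal pure signal states give one bit, identical states give zero, and non-orthogonal pure states land strictly in between.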


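Given two states $\rho,\sigma \in \mathcal S(\mathcal K)$, the relative entropy of them is defined as

$$D(\rho\|\sigma) := \begin{cases} \mathrm{tr}\{\rho(\log(\rho) - \log(\sigma))\}, & \text{if } \mathrm{supp}(\rho) \subset \mathrm{supp}(\sigma), \\ \infty, & \text{else.} \end{cases} \tag{4}$$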


Another way of measuring distance between quantum states is given by using the one-norm,
which obeys:

$$\|\rho - \sigma\|_1 = 2\max_{0\le P\le \mathbb 1}\mathrm{tr}\{P(\rho - \sigma)\}. \tag{5}$$

We now fix our notation for representation theoretic objects and state some basic facts. 

The symbols $\lambda, \mu$ will be used to denote Young frames. The set of Young frames with at most $d \in \mathbb N$ rows
and $n \in \mathbb N$ boxes is denoted $Y_{d,n}$.

For any given $n$, the representation of $S_n$ we will consider is the standard representation on $(\mathbb C^d)^{\otimes n}$ that
acts by permuting tensor factors. Throughout, the dimension $d$ of our basic quantum system will remain
fixed.

The unique complex vector space carrying the irreducible representation of $S_n$ corresponding to a Young
frame $\lambda$ will be written $F_\lambda$.

The multiplicity of an irreducible subspace of our representation corresponding to a Young frame A is 
denoted rnA.n, and this quantity can be upper bounded by mA,n < (2n)^ (see [TS]1. 

For $\lambda \in Y_{d,n}$, $\bar\lambda \in \mathcal P([d])$ is defined by $\bar\lambda(i) := \lambda_i/n$. If $\rho \in \mathcal S(\mathbb C^d)$ has spectrum $s \in \mathcal P([d])$ (in case
that $\rho$ has degenerate eigenvalues we count them multiple times!), then it will always be assumed that
$s(1) \ge \dots \ge s(d)$ holds and the distance between a spectrum $s$ and a Young frame $\lambda \in Y_{d,n}$ is measured
by $\|\bar\lambda - s\| := \sum_{i=1}^d |\bar\lambda(i) - s(i)|$. The distance between two probability distributions $p, q \in \mathcal P([d])$ will be
measured by $\|p - q\| := \sum_i |p(i) - q(i)|$.

A positive operator-valued measurement (POVM) $D$ on a Hilbert space $\mathcal K$ is given by a collection $D =
\{D_m\}_{m=1}^M \subset \mathcal B(\mathcal K)$ of non-negative operators that sum up to the identity: $\sum_{m=1}^M D_m = \mathbb 1$.

The Kostka numbers $K_{f,\lambda}$ are as defined in e.g. Fulton’s book [26], pages 25-26.

We now define two important entropic quantities. Given a finite set X and two probability distributions 
$r, s \in \mathcal P(X)$, we define the relative entropy $D(r\|s)$ by

$$D(r\|s) := \begin{cases} \sum_{x\in X} r(x)\log(r(x)/s(x)), & \text{if } s \gg r, \\ \infty, & \text{else.} \end{cases} \tag{6}$$


In case that II(r||s) = oo, for a positive number a > 0, we use the convention = 0. The relative 

entropy is connected to $\|\cdot\|$ by Pinsker’s inequality $D(r\|s) \ge \alpha\|r - s\|^2$, where $\alpha := 1/(2\ln(2))$.
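Pinsker’s inequality can be checked numerically on small examples. The sketch below does so for one hypothetical pair of distributions; the function names and the numbers are assumptions chosen for illustration, and the logarithm is taken to base two as elsewhere in this work.

```python
import math

def relative_entropy(r, s):
    """D(r||s) in bits; infinite unless s(x) > 0 wherever r(x) > 0."""
    total = 0.0
    for rx, sx in zip(r, s):
        if rx > 0:
            if sx == 0:
                return math.inf
            total += rx * math.log2(rx / sx)
    return total

def l1_distance(r, s):
    """||r - s||: sum of absolute differences of the two distributions."""
    return sum(abs(rx - sx) for rx, sx in zip(r, s))

ALPHA = 1 / (2 * math.log(2))   # constant for base-two logarithms

# Hypothetical example distributions on a three-element set.
r = [0.5, 0.3, 0.2]
s = [0.25, 0.25, 0.5]
lhs = relative_entropy(r, s)
rhs = ALPHA * l1_distance(r, s) ** 2
print(lhs, rhs)
```

The support condition in (6) is reflected by the early return of `math.inf`.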

The entropy of $r \in \mathcal P(X)$ is defined by the formula

$$H(r) := -\sum_{x\in X} r(x)\log(r(x)). \tag{7}$$


During proofs, we will be having one fixed state a G S{C‘^) (this will be the average output state) having 
a (non-unique) decomposition cr = and the pinching of an arbitrary state p G 5(0*^) (one 

of the channel output states px) to the orthonormal basis will be given by 

and induces the probability distribution fp G fP([d]) through fp(i) := {ei,pei). It is important for the 

understanding of this paper to keep in mind that the equality D{p\\a) = —H{spec{p)) — J2i=i 

holds. 

We also need the notion of a convex hull. This is e.g. defined in [52]. For a subset $B \subset \mathbb R^n$ or $B \subset \mathbb C^n$
we denote its convex hull by $\mathrm{conv}(B)$.


III Definitions and preliminary results

The direct part of our work is based on the preceding results [41]. We give a short review of the basic 
ideas utilized there. Let $n \in \mathbb N$ be fixed for the moment. The most important technical definition for this
work is that of frequency-typical subspaces $V_f$ of $(\mathbb C^d)^{\otimes n}$. These arise from choosing a fixed orthonormal
basis $\{e_i\}_{i=1}^d$ of $\mathbb C^d$, choosing a frequency $f$ (a function $f : [d] \to \mathbb N$ satisfying $\sum_{i=1}^d f(i) = n$), setting
$T_f := \{(i_1,\dots,i_n) : |\{k : i_k = j\}| = f(j)\ \forall j \in [d]\}$, and defining

$$V_f := \mathrm{span}(\{e_{i_1} \otimes \dots \otimes e_{i_n} : (i_1,\dots,i_n) \in T_f\}). \tag{8}$$

They have been widely used in quantum information theory, but share one very nice property that does 
not seem to have been exploited yet: They are invariant under permutations. From this property it 
immediately follows that 

$$V_f = \bigoplus_{\lambda} V_{f,\lambda}, \tag{9}$$

where each $V_{f,\lambda}$ is just a direct sum of irreducible representations corresponding to $\lambda$ that is contained
entirely within $V_f$.

A fundamental representation theoretic quantity which is intimately connected to them is given by the Kostka
numbers. In fact, it holds $K_{f,\lambda} = 0 \Leftrightarrow V_{f,\lambda} = \{0\}$, both by definition of the Kostka numbers and by
application of Young symmetrizers as described in [51], pages 254-258.

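Also, we are going to employ the following estimate taken from [20, Lemma 2.3] (see equation (10)),
which is valid for all frequencies $f : [d] \to \mathbb N$ that satisfy $\sum_{i=1}^d f(i) = n$, with $\bar f := f/n$ denoting the normalized frequency:

$$\frac{1}{(n+1)^d}\,2^{nH(\bar f)} \le |T_f| \le 2^{nH(\bar f)}. \tag{10}$$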
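The type-counting estimate can be verified directly for small parameters, since the size of $T_f$ is a multinomial coefficient. The sketch below does this for one hypothetical frequency with $d = 3$ and $n = 10$; the function names are illustrative assumptions.

```python
import math

def type_class_size(f):
    """|T_f|: number of index sequences in which symbol j occurs f[j] times."""
    size = math.factorial(sum(f))
    for fj in f:
        size //= math.factorial(fj)   # exact: the multinomial coefficient is an integer
    return size

def entropy_bits(dist):
    """Shannon entropy with base-two logarithm and the convention 0 log 0 = 0."""
    return -sum(q * math.log2(q) for q in dist if q > 0)

f = (5, 3, 2)                 # hypothetical frequency on [d] with d = 3
n, d = sum(f), len(f)
size = type_class_size(f)
upper = 2 ** (n * entropy_bits([fj / n for fj in f]))
lower = upper / (n + 1) ** d
print(lower, size, upper)
```

The polynomial prefactor $(n+1)^{-d}$ is what makes the two bounds agree on the exponential scale.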

We will also need Lemma 2.7 from [20]:

Lemma 1. If, for $A$ a finite alphabet and $p, q \in \mathcal P(A)$ we have $\|p - q\| \le \Theta \le 1/2$, then

$$|H(p) - H(q)| \le -\Theta\log\frac{\Theta}{|A|}. \tag{11}$$

Another very important estimate is the following one (a derivation can e.g. be found in [40]):

2 n(ff(A)-Milog( 2 «)) < (A G Yd,n)- (12) 

Let $A$ be any finite set. For every $\delta > 0$, $p \in \mathcal P(A)$ and $n \in \mathbb N$, we set $T^n_{p,\delta} := \{a^n \in A^n : \|p - \bar N(\cdot|a^n)\| \le
\delta\}$. It is a well-known fact (see e.g. [50] or, if more notational compliance is desired, [35] or [53]) that
this definition implies that for all large enough $n \in \mathbb N$ we will have

$$p^{\otimes n}(T^n_{p,\delta}) \ge 1 - 2^{-n\delta/2}. \tag{13}$$


IV Operational Definitions 

We will in the following deal with classical-quantum channels that are dependent on an additional parameter
$s$, called the ’channel state’ or simply the state. Such channels will be denoted $W_{S\times X\to\mathcal K}$. Here
$X$ denotes the alphabet which is used by the sender to encode his messages into the quantum system,
and $S$ denotes the possible channel states. Both sets are finite. The channel states are assumed to be
selected according to some distribution $p$, and the selection of channel states over $n$ uses of the channel
is assumed to be independent and identically distributed. As the channel is assumed to be memoryless
as well, the whole system can be described by the pair $(W_{S\times X\to\mathcal K}, p)$, and we will use this notation
henceforth. During the treatment of the problem it turns out to be useful to define additional channels
which are derived from the original model by adding a randomized encoding $E \in Ch(U, X)$ which leads
to a new cq-channel $W_{U\times S\to\mathcal K}$ defined by the states

$$\rho_{s,u} := \sum_{x\in X} e(x|u)\rho_{s,x}. \tag{14}$$

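Definition 1 (Non-causal code). A code $\mathcal K_n$ (for $n$ channel uses) consists of a natural number $M_n$, a
stochastic map $E \in Ch([M_n] \times S^n, X^n)$ together with a decoding POVM $D$ on $\mathcal K^{\otimes n}$. The average error
of the code is

$$\mathrm{err}(\mathcal K_n) := 1 - \frac{1}{M_n}\sum_{m=1}^{M_n}\sum_{s^n\in S^n} p^{\otimes n}(s^n)\sum_{x^n\in X^n} e(x^n|m,s^n)\,\mathrm{tr}\{\rho_{s^n,x^n}D_m\}. \tag{15}$$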

Definition 2 (Causal code). A code $\mathcal K_n$ (for $n$ channel uses) consists of a natural number $M_n$ and a
stochastic map $E \in Ch([M_n] \times S^n, X^n)$ that satisfies for every $t \in [n]$ the additional constraint that its
marginal distributions $e_t(\cdot|m,s^n) \in \mathcal P(X^t)$, which are defined by $e_t(x^t|m,s^n) := \sum_{x_{t+1},\dots,x_n} e(x^n|m,s^n)$,
do only depend on $s^t$: There exist $E_t \in Ch([M_n] \times S^t, X^t)$ such that for every $x^t \in X^t$ we have

$$e_t(x^t|m,s^n) = e_t(x^t|m,s^t). \tag{16}$$

A causal code further contains a decoding POVM $D = \{D_m\}_{m\in[M_n]}$ on $\mathcal K^{\otimes n}$. The average error of the
code is

$$\mathrm{err}(\mathcal K_n) := 1 - \frac{1}{M_n}\sum_{m=1}^{M_n}\sum_{s^n\in S^n} p^{\otimes n}(s^n)\sum_{x^n\in X^n} e(x^n|m,s^n)\,\mathrm{tr}\{\rho_{s^n,x^n}D_m\}. \tag{17}$$
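For deterministic encoders and a fixed message, the causality constraint (16) says that the first $t$ code symbols are determined by the first $t$ channel states. The following sketch checks this by brute force over all state sequences; the two example encoders and the function name `is_causal` are hypothetical and serve only to illustrate the distinction between Definition 1 and Definition 2.

```python
from itertools import product

def is_causal(encoder, states, n):
    """True if, for every t, the first t output symbols depend only on s_1..s_t."""
    for s_a, s_b in product(product(states, repeat=n), repeat=2):
        for t in range(1, n + 1):
            # Two state sequences that agree up to time t must produce the same prefix.
            if s_a[:t] == s_b[:t] and encoder(s_a)[:t] != encoder(s_b)[:t]:
                return False
    return True

def causal_encoder(s_n):
    """x_t = s_t: uses only the current channel state."""
    return tuple(s_n)

def lookahead_encoder(s_n):
    """x_t = s_{t+1} (cyclically): peeks at a future channel state."""
    return tuple(s_n[1:]) + (s_n[0],)

causal_ok = is_causal(causal_encoder, (0, 1), 3)
lookahead_ok = is_causal(lookahead_encoder, (0, 1), 3)
print(causal_ok, lookahead_ok)
```

Both encoders are admissible as non-causal codes in the sense of Definition 1, but only the first one satisfies the additional constraint of Definition 2.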


We now define what achievable rates are: 
Definition 3 (Achievable rates). A number $R \ge 0$ is called a non-causally achievable rate for
$(W_{S\times X\to\mathcal K}, p)$ if there exists a sequence $(\mathcal K_n)_{n\in\mathbb N}$ of non-causal codes such that

$$\lim_{n\to\infty}\mathrm{err}(\mathcal K_n) = 0, \qquad \liminf_{n\to\infty}\frac{1}{n}\log(M_n) \ge R. \tag{18}$$

The number $R \ge 0$ is called causally achievable if there is a sequence $(\mathcal K_n)_{n\in\mathbb N}$ of causal codes such that

$$\lim_{n\to\infty}\mathrm{err}(\mathcal K_n) = 0, \qquad \liminf_{n\to\infty}\frac{1}{n}\log(M_n) \ge R. \tag{19}$$

Naturally, this leads to the following two definitions of capacity: 

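Definition 4 (Capacities). The non-causal capacity of $(W_{S\times X\to\mathcal K}, p)$ is

$$C(W_{S\times X\to\mathcal K}, p) := \sup\{R : R \text{ is a non-causally achievable rate for } (W_{S\times X\to\mathcal K}, p)\}. \tag{20}$$

The causal capacity of $(W_{S\times X\to\mathcal K}, p)$ is

$$C_c(W_{S\times X\to\mathcal K}, p) := \sup\{R : R \text{ is a causally achievable rate for } (W_{S\times X\to\mathcal K}, p)\}. \tag{21}$$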


V Main Results 


Our main results are the following two coding theorems: 

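Theorem 1. Let $W_{S\times X\to\mathcal K}$ be a classical-quantum channel. Let $p \in \mathcal P(S)$. It holds

$$C_c(W_{S\times X\to\mathcal K}, p) = \max_{q\in\mathcal P(U)}\ \max_{V\in Ch_p(U, S\times X)} \chi(q, W_{S\times X\to\mathcal K}\circ V) \tag{22}$$

where $Ch_p(U, S\times X) := \{V \in Ch(U, S\times X) : \exists\, \tilde V \in Ch(S\times U, X) : v(s,x|u) = \tilde v(x|s,u)p(s)\ \forall (s,u,x) \in
S\times U\times X\}$ and the size of the alphabet $U$ may be bounded by $|U| \le |X|\cdot|S|$.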

Remark 1. For each fixed $p \in \mathcal P(S)$ and finite $U$, the set $Ch_p(U, S\times X)$ is convex. In fact, the
optimization is running on $Ch(S\times U, X)$ (which is of course convex as well) and the above way of stating
the coding theorem for $C_c$ is just one of the shorter ways to write down the capacity formula, which would
otherwise involve concatenated channels $V\otimes\mathrm{Id}$ going from $(U\times S)\times S$ to $S\times X$ and being fed with a
distribution $q_U\otimes p^{(2)}$, where $p^{(2)} \in \mathcal P(S\times S)$ is defined via setting $p^{(2)}(s,s') := p(s)\delta(s,s')$.

The convexity of the set over which the maximum is taken for a fixed $q \in \mathcal P(U)$ together with convexity
of Holevo information in the state set lets us conclude that for fixed $p$, the solutions to the optimization
problem are to be found on the boundary of $Ch_p(U, S\times X)$. This boundary consists of channels $V$ for
which $v(s,x|u) \in \{0, p(s)\}$ for all $(s,u,x) \in S\times U\times X$. Such channels are in a one to one correspondence
to functions $\varphi : S\times U \to X$ and therefore solutions to the optimizing problem take the form $v(s,x|u) =
p(s)\delta(\varphi(s,u),x)$ where $\varphi : S\times U \to X$ is a function.
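The role of the functions $\varphi$ can be illustrated on a toy example. The sketch below assumes a hypothetical channel with binary alphabets whose output states commute (they are diagonal, with the output bit equal to $x \oplus s$), so that the Holevo quantity reduces to a difference of Shannon entropies. The input distribution $q$ is fixed to be uniform rather than optimized, and all names are illustrative assumptions.

```python
import math
from itertools import product

S, X, U = (0, 1), (0, 1), (0, 1)
p = {0: 0.5, 1: 0.5}   # distribution of the channel state (assumed uniform)
q = {0: 0.5, 1: 0.5}   # distribution on U (fixed here, not optimized)

def output(s, x):
    """Diagonal of rho_{s,x}: the output bit equals x XOR s with certainty."""
    y = x ^ s
    return [1.0 if k == y else 0.0 for k in (0, 1)]

def entropy_bits(dist):
    return -sum(v * math.log2(v) for v in dist if v > 0)

def holevo_for(phi):
    """chi(q, W o V) for v(s,x|u) = p(s) * delta(phi(s,u), x); all states commute."""
    states = {u: [sum(p[s] * output(s, phi[(s, u)])[k] for s in S) for k in (0, 1)]
              for u in U}
    avg = [sum(q[u] * states[u][k] for u in U) for k in (0, 1)]
    return entropy_bits(avg) - sum(q[u] * entropy_bits(states[u]) for u in U)

# Brute-force search over all functions phi : S x U -> X.
pairs = list(product(S, U))
best_value, best_phi = max(
    ((holevo_for(dict(zip(pairs, xs))), dict(zip(pairs, xs)))
     for xs in product(X, repeat=len(pairs))),
    key=lambda item: item[0])

# Encoder that ignores the state: phi(s, u) = u.
ignore_state = holevo_for({(s, u): u for s, u in pairs})
print(best_value, ignore_state)
```

In this toy model an encoder that ignores the state transmits nothing, whereas a suitable $\varphi$ cancels the state and reaches one bit, which is the qualitative effect the capacity formula captures.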

We now come to the characterization of the non-causal capacity. Here, we are able to give three 
different characterizations, and unfortunately none of them is a single letter formula. 

Theorem 2. Let $W_{S\times X\to\mathcal K}$ be a classical-quantum channel. Let $p \in \mathcal P(S)$.

C{Wsx^^K,p) = lim (23) 

n^OD 77, r/v. 

In addition to that we have, for every $n \in \mathbb N$, every finite alphabet $U_n$ and setting $A_n := \{q_{S^nU_nX^n} \in
\mathcal P(S^n\times U_n\times X^n) : q_{S^n} = p^{\otimes n}\}$, that

$$C(W_{S\times X\to\mathcal K}, p) \ge \frac{1}{n}\max_{q_{S^nU_nX^n}\in A_n}\big(\chi(q_{U_n}, W_{U_n\to\mathcal K^{\otimes n}}) - I(U_n;S^n)\big), \tag{24}$$

where $\mathbb P((S^n, U_n, X^n) = (s^n,u,x^n)) = q_{S^nU_nX^n}(s^n,u,x^n)$, and to every $q_{S^nU_nX^n} \in A_n$ we
define a corresponding $W_{U_n\to\mathcal K^{\otimes n}}$ by setting, for every $u \in U_n$,

$$W_{U_n\to\mathcal K^{\otimes n}}(u) := \sum_{s^n\in S^n}\sum_{x^n\in X^n} q_{S^nX^n|U_n}(s^n,x^n|u)\,W^{\otimes n}_{S\times X\to\mathcal K}(s^n,x^n). \tag{25}$$

The size of the alphabet $U_n$ in the above optimization problem can, for every $n \in \mathbb N$, be bounded by $|U_n| \le
(|S|\cdot 2\cdot|X|)^n$. In particular, the lower bound provides a single-letter lower bound to the non-causal
capacity when $n$ is set to equal one.

Inequality (24) together with a converse result implies that

$$C(W_{S\times X\to\mathcal K}, p) = \lim_{n\to\infty}\frac{1}{n}\max_{q_{S^nU_nX^n}\in A_n}\big(\chi(q_{U_n}, W_{U_n\to\mathcal K^{\otimes n}}) - I(U_n;S^n)\big). \tag{26}$$

Remark 2. As in the classical Gel’fand-Pinsker theorem [28, Theorem 1 and Proposition 1] the
functions $\Phi_n$ going from $A_n$ to $\mathbb R$ defined by $\Phi_n(q_{S^nU_nX^n}) := \chi(q_{U_n}, W_{U_n\to\mathcal K^{\otimes n}}) - I(S^n;U_n)$ have
a convexity property: it is always possible to write a $q_{S^nU_nX^n} \in A_n$ as $q_{S^nU_nX^n}(s^n,u,x^n) =
q_{U_n|S^n}(u|s^n)\,p^{\otimes n}(s^n)\,q_{X^n|S^nU_n}(x^n|s^n,u)$ and from convexity of the Holevo quantity in the channel (or
the states of the ensemble, respectively) it then follows that each $\Phi_n$ is convex in $q_{X^n|S^nU_n}$ if the other
quantities remain fixed. Since the maximum of a convex function over a closed convex set is always
achieved at the boundary it follows that the maximum in equation (26) is always achieved for an extremal
$q_{X^n|S^nU_n}$, which can be written as $q_{X^n|S^nU_n}(x^n|s^n,u) = \delta(\varphi(s^n,u),x^n)$ for some appropriately chosen
function $\varphi : S^n\times U_n \to X^n$; randomization at the encoder is not necessary.

We have been unable so far to prove that concavity of $\Phi_n$ in $q_{U_n|S^n}$ holds. While this seems to be of
minor importance it does hold for the original Gel’fand Pinsker problem, and this may be giving us a hint
as to why the capacity cannot be easily single-letterized.

Remark 3. One immediate consequence of the capacity formula (26) is that the non-causal capacity is
a continuous function of the system parameters $(W_{S\times X\to\mathcal K}, p)$. The continuity of the quantum capacity
of a channel was listed as an open problem on the problem page of the ITP Hannover for about six
years. The capacity was finally proven to be continuous by Leung and Smith in [37]. That the question of
continuity itself is not a trivial one can be seen by taking a look at other capacities like e.g. the zero-error
capacity (see e.g. [23] for precise definitions in case of quantum channels), which is not continuous (see
e.g. for that observation). 

Recently it has been demonstrated [14] that the capacity of arbitrarily varying quantum channels can
be discontinuous as well. Especially this latter model is very close to the one treated here, with the 
only exception being that the sender has no information regarding the channel state sequence and, in 
addition, the choice of channel state cannot be assumed to follow a probabilistic law. In that model, it is 
usually assumed that the choice of channel state sequence is made by a third party that tries to prevent 
communication and is therefore called a jammer. 

VI Proofs 

Direct part of Theorem 2. Let $p_{SU} \in \mathcal P(S\times U)$ be any probability distribution such that its one marginal
satisfies $p_S = p$. Without loss of generality, $p(s) > 0$ for every $s \in S$. Since $U$ is free to choose we may
as well assume that the other marginal $p_U$ satisfies $p_U(u) > 0$ for every $u \in U$. We may then define
$p_{S|U}(\cdot|u) \in \mathcal P(S)$ by $p_{S|U}(s|u) := p_{SU}(s,u)/p_U(u)$ for all $u \in U$, $s \in S$. While the original channel is
$W_{S\times X\to\mathcal K}$ with output states $\rho_{s,x}$, the use of $q \in \mathcal P(S\times U\times X)$ with respective marginal distribution $q_{SU} = p_{SU}$
and conditional distribution $q_{X|SU}$ defined by $q_{X|SU}(x|s,u) := q_{SUX}(s,u,x)/q_{SU}(s,u)$ for every choice of
$s \in S$, $u \in U$ and $x \in X$ defines a channel $W_{S\times U\to\mathcal K}$ via the output states $\rho_{s,u} := \sum_{x\in X} q_{X|SU}(x|s,u)\rho_{s,x}$
and another channel defined by $W_{U\to\mathcal K}(u) := \sum_{s\in S} p_{S|U}(s|u)\rho_{s,u}$. We now come to our choice of code.

Consider the probability of successful transmission of $K$ messages over a random choice of $K\cdot M$ code
words, each chosen independently and according to $P_U(\cdot) := \mathbb 1_{T_U}(\cdot)\,|T_U|^{-1}$, where $T_U := \{u^n : \bar N(u|u^n) =
t(u)\}$ for some type $t$ such that $t \in \mathcal P_0^n(U)$. To any given $\varepsilon > 0$ we can choose $n$ large enough such
that $\|t - p_U\|_1 \le \varepsilon$ is assured if necessary. Choosing $2\cdot\varepsilon < \beta(p_U) := \min_{u\in U} p_U(u)$ we additionally get
$t(u) \ge \beta(p_U)/2$ for all large enough $n \in \mathbb N$ and $u \in U$.

More precisely, a code $C$ is a set $\{u^n_{km}\}_{k,m=1}^{K,M}$ of code words, to which we associate a POVM
$D(C)$ for the decoder. The exact choice of POVM will be explained later.

The code is chosen at random, with the underlying distribution given by $P(C) = \prod_{k,m} P_U(u^n_{km})$. An
additional feature then is that the encoder only uses those words which are jointly typical with the
channel state $s^n$. Of course, in order to specify “joint typicality” we need to introduce the parameter
$\delta > 0$ which will remain fixed for the remainder of the discussion, so that we can spare one index. To
any given choice $u^n \in T_U$ we set

$$M(u^n) := \{s^n : \max_{u\in U}\ t(u)\cdot D\big(t(u)^{-1}\bar N(\cdot,u|s^n,u^n)\,\|\,p_{S|U}(\cdot|u)\big) \le \delta/2\}. \tag{27}$$

It may in principle be possible that this set is empty. A code word is only used at the encoder if the
state sequence chosen by the Jammer satisfies $s^n \in M(u^n)$. For a given collection of code words let us set
$K(m,s^n) := \{k : s^n \in M(u^n_{km})\}$. Roughly speaking, this ensures that code words always have a certain
structure relative to the Jammer’s choice. The expected average success probability of a random code
then is


M 


Epsu:=^P(C)-^ ^ pTis-) 


m—1 


|iG(m, s")| 




(28) 


kGK{m,s'^) 


Here, given the state sequence $s^n$ and the message $m$, the encoder encodes $m$ into any of the code
words $u^n_{km}$ with $k \in K(m,s^n)$ with equal probability. The index $k$ of the code word is not decoded by the
receiver.

The use of such code words generates sequences at the output of the channel which look (up to small
deviations) as if they were randomly drawn according to $p_{SU}^{\otimes n}$. Thus on average, the decoder gets the
state $\big(\sum_{u,s} p_{SU}(s,u)\rho_{s,u}\big)^{\otimes n}$.

It remains to define the POVM D(C). 

We let $\{e_i\}_{i=1}^d$ be an orthonormal basis in which $\rho := \sum_{u,s} p_{SU}(s,u)\rho_{s,u}$ is diagonal. From now on, it
is understood that the vector spaces $V_f$ defined in (8) are defined using that basis. Let $P_f$ denote the
projections onto these vector spaces. Clearly, it holds that $P_f = \sum_\lambda P_{f,\lambda}$, where $P_{f,\lambda}$ are the projections
onto $V_{f,\lambda}$ and $\mathrm{tr}\{P_{f,\lambda}P_{f',\lambda'}\} = 0$ whenever $(f,\lambda) \ne (f',\lambda')$. For every $m \in \mathbb N$, $u \in U$ and $\delta' > 0$ we now
set


■■= {(/,A) : D{f\\r,J < 6', D{X\KJ < V, A G Y,.^, /g qj™([d])} . (29) 

The quantities rp^ and fp^ got introduced at the end of the notations section. Take any ordering of 
$U$, such that we can without loss of generality write $U = \{1,\dots,|U|\}$. For each $u^n \in U^n$, let $\tau \in S_n$
be a permutation which achieves $\tau(u^n) = (1,1,\dots,1,2,2,\dots,2,\dots)$, i.e. $\tau$ orders the symbols in $u^n$ in
increasing order. We then write /C„ := and define P(m|u”) G S(/C„) and P(u”) G S(/C®") via 


P{u\un := ^ Pf.x 



(30) 

(31) 
where the action of $\tau$ on $\mathcal K^{\otimes n}$ is the standard action of the symmetric group. These projections satisfy,
for every $u \in U$:


tr{P(u|M’")pf(“)} = 1- 

^ tr{P/,Apf(“)} 

(32) 




> 1 - 

max{tr{P;pf(“)},tr{PApf(“)}} 

(33) 




> 1 - 

{2df" max 2 *(“)-d(^II’-'') J 

(34) 




> 1 - 

(2-t(M))^"2"*(“^''^'^ 

(35) 

> 1 - 

(2-n)‘^"2-"^ 

(36) 


Here, the first inequality follows from two observations: first, with $P_\lambda := \sum_f P_{f,\lambda}$ we have $P_{f,\lambda} = P_fP_\lambda =
P_\lambda P_f$ (for all $f$ such that $\bar f \in \mathcal P_0^n([d])$ and $\lambda \in Y_{d,n}$). This can for example be seen from the construction
of Young symmetrizers in m Chapter 5.5]. Second, if two projections Q and Q' satisfy Q ■ Q' = Q' ■ Q, 
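then for every $X \geq 0$ we have

\[ \mathrm{tr}\{XQQ'\} = \mathrm{tr}\{QXQQ'\} \leq \mathrm{tr}\{QXQ\} = \mathrm{tr}\{XQ\} \tag{37} \]

and, at the same time,

\[ \mathrm{tr}\{XQQ'\} = \mathrm{tr}\{Q'XQ'Q\} \leq \mathrm{tr}\{Q'XQ'\} = \mathrm{tr}\{XQ'\}. \tag{38} \]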
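
The two trace estimates for commuting projections can be checked numerically on a small example. The following Python sketch is only an illustration: the matrices `B`, `Q`, `Qp`, the dimension 3 and the helper names `matmul`, `transpose` and `trace` are assumptions made here and do not appear in the paper.

```python
# Sanity check of tr{XQQ'} <= min(tr{XQ}, tr{XQ'}) for commuting projections.
# All concrete matrices below are hypothetical illustration data.

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

# X = B^T B is positive semidefinite by construction.
B = [[1, 2, 0], [0, 1, 3], [2, 0, 1]]
X = matmul(transpose(B), B)

# Two diagonal (hence commuting) orthogonal projections.
Q = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
Qp = [[0, 0, 0], [0, 1, 0], [0, 0, 1]]
assert matmul(Q, Qp) == matmul(Qp, Q)

lhs = trace(matmul(X, matmul(Q, Qp)))   # tr{X Q Q'}
rhs_q = trace(matmul(X, Q))             # tr{X Q}
rhs_qp = trace(matmul(X, Qp))           # tr{X Q'}
print(lhs, rhs_q, rhs_qp)
```

Integer arithmetic keeps the comparison exact; for non-diagonal commuting projections the same helpers can be reused.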

The second inequality arises as follows: First observe that tr{P/p®*^“^} = tr{P/} 
then combine this observation with the upper bound in Lemma 2.3 of [50] (see equation (1101) 1 and the 
definition of the relative entropy. Second, observe that for an arbitrary A G and cr G 5(C'^) it holds 

tr{PAO'®*(“^} < (2 • t{u))'P ■ (see for example [171 Theorem 4] and references therein). 

The third inequality is simple type counting and the fourth uses the definition of g ,. 
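
The "simple type counting" step rests on the standard fact that the number of types of length-$n$ sequences over an alphabet of size $k$ is $\binom{n+k-1}{k-1} \leq (n+1)^k$, i.e. polynomial in $n$. A minimal Python sketch of this fact follows; the function names `types` and `count_types_by_enumeration` and the parameters $n = 5$, $k = 3$ are illustrative assumptions, not notation from the paper.

```python
from itertools import product
from math import comb

def types(n, k):
    """All count vectors (c_1, ..., c_k) of nonnegative integers summing to n."""
    if k == 1:
        return [(n,)]
    return [(c,) + rest for c in range(n + 1) for rest in types(n - c, k - 1)]

def count_types_by_enumeration(n, k):
    """Count types by brute force over all k**n sequences (small n only)."""
    seen = set()
    for seq in product(range(k), repeat=n):
        seen.add(tuple(seq.count(a) for a in range(k)))
    return len(seen)

n, k = 5, 3
num_types = len(types(n, k))
print(num_types, comb(n + k - 1, k - 1), (n + 1) ** k)
```

The polynomial bound $(n+1)^k$ is what allows a maximum over types to replace a sum over types at the cost of a polynomial prefactor.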

It follows that 


tr{P(u")p„n} > 1 — |U| • max(2 • n)*^ -2 

_ 2^ _ log(|U|^.n)) 

> 1 - 

if only $n$ is large enough. For message transmission over a known memoryless channel, this estimate would already be completely sufficient. However, in our case the receiver is kept ignorant about the choice $s^n$ of the jammer, which is only revealed to the encoder. Thus the encoder will try to ensure that $N(\cdot|s^n,u^n) \approx p_{US}$. Let us now for the moment consider an arbitrary pair $(s^n,u^n)$. We investigate the stability of the estimate (41) under small variations in the following sense: For every $u \in U$, define $p(\cdot|u) \in \mathcal P_0^{t(u)}(S)$ by fixing for each $s \in S$ and $u \in U$ the numbers $p(s|u)$ via $p(s|u) := N(s,u|s^n,u^n)/t(u)$.
Then


(39) 

(40) 

(41) 


tr{ps^,u-'P{u'^)} = tr 


E 



Ps*M,uPiu\u^) 


(42) 


and equality holds since all our POVMs are permutation-invariant on those blocks $\mathcal K_u$ where $u^n$ is constant. But whenever an operator $P \in \mathcal B(\mathcal K_u)$ is invariant on a block where $u^n$ is constant we get


tr{p,tM = tr < ^ 


1 


\Tp{-\u) 


' Pgt(u) 


,P 


\Tp(.\u) \ 

1 


tr 




< • tr{p®"P}, 


(43) 

(44) 

(45) 

(46) 


where the last inequality follows from the upper bound on T'p(.iu) in Lemma 2.3 of [20] (see equation (fTOll 'l. 
Comparing with our previous estimate (1551) and using, for the moment, the notation u := (u,.. .,u) £ 
U*(“) 

this allows us to deduce that 


tr{p^t(„) uP«} = 1 - ^ tr{pst(„)_„P(u|M")} 

> 1-(2n)l®l2*(“)-°(P(-l“)IIJ’®(-l“» tr{p®"P/.A} 

> 1 — 


= 1 — 


(47) 

(48) 

(49) 

(50) 


and ultimately leads, for all s" £ M(it”) (which then satisfy max„ gut(M)£’(^^(-|s‘^“^lbs(-|w)) < 1 ^/ 2 ) to 

tr{p«n,„„P(u")} > 1 - |U|(2ri)l®l+'^'2'‘('5/2-^) (51) 

> 1 _ 2-"V4^ (52) 


for all large enough n £ N. 

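We now continue with the definition of our POVM: we identify any given collection $(u_{km})_{k,m=1}^{K,M}$ of code words with the code $\mathcal C$ (i.e. we use $\mathcal C$ as a shorthand for $(u_{km})_{k,m=1}^{K,M}$) which arises from using the following POVM: For any $k, m$ use the abbreviation $P_{km} := P(u_{km})$. We define

\[ P_m := \sum_{k=1}^K P_{km} \tag{53} \]

and

\[ D_m := \Bigl(\sum_{m'=1}^M P_{m'}\Bigr)^{-1/2} P_m \Bigl(\sum_{m'=1}^M P_{m'}\Bigr)^{-1/2}. \tag{54} \]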


Through application of the Hayashi-Nagaoka bound $(S+T)^{-1/2}S(S+T)^{-1/2} \geq 2S - \mathbb 1 - 4T$ to the $D_m$
we get

\[ D_m \geq 2\sum_{k=1}^K P_{km} - \mathbb 1 - 4\sum_{k=1}^K\sum_{m' \neq m} P_{km'}. \tag{55} \]

Recall from equations (1551) and (EZj) that

\[ M(u^n) := \{s^n : \max_{u \in U} t(u)\cdot D(t(u)^{-1}N(\cdot,u|s^n,u^n)\,\|\,p_S(\cdot|u)) < \delta/2\}, \tag{56} \]

\[ K(m,s^n) := \{k : s^n \in M(u_{km})\}. \tag{57} \]
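
To make the membership test $s^n \in M(u^n)$ concrete, the following Python sketch computes the joint type $N(\cdot,\cdot|s^n,u^n)$ and the quantity $\max_u t(u)\cdot D(t(u)^{-1}N(\cdot,u|s^n,u^n)\,\|\,p_S(\cdot|u))$. It is a sketch under assumptions: the function names, the toy sequences and the convention of passing the threshold explicitly are chosen here for illustration, and the exact normalization of the threshold should be taken from the definition of $M(u^n)$ in the paper.

```python
from collections import Counter
from math import inf, log2

def joint_type(s_seq, u_seq):
    """Joint counts N(s, u | s^n, u^n) of two equally long sequences."""
    assert len(s_seq) == len(u_seq)
    return Counter(zip(s_seq, u_seq))

def divergence_score(s_seq, u_seq, p_s_given_u):
    """max_u t(u) * D( N(., u)/t(u) || p_S(.|u) ), with logarithms to base 2."""
    counts = joint_type(s_seq, u_seq)
    t = Counter(u_seq)                     # t(u): number of occurrences of u
    worst = 0.0
    for u, t_u in t.items():
        d = 0.0
        for (s, v), c in counts.items():
            if v != u:
                continue
            emp = c / t_u                  # empirical conditional probability
            ref = p_s_given_u[u].get(s, 0.0)
            if ref == 0.0:
                return inf                 # divergence is infinite
            d += emp * log2(emp / ref)
        worst = max(worst, t_u * d)
    return worst

def in_M(s_seq, u_seq, p_s_given_u, threshold):
    """Membership test with an explicitly supplied threshold (e.g. delta/2)."""
    return divergence_score(s_seq, u_seq, p_s_given_u) < threshold

uniform = {"a": {"0": 0.5, "1": 0.5}, "b": {"0": 0.5, "1": 0.5}}
score_typical = divergence_score("0101", "aabb", uniform)
score_atypical = divergence_score("0000", "aabb", uniform)
print(score_typical, score_atypical)
```

The set $K(m,s^n)$ is then obtained by running `in_M` over the code words $u_{km}$, $k = 1,\dots,K$, belonging to message $m$.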
For a given random choice $(u_{km})_{k,m=1}^{K,M}$ of codewords, it will be necessary to see whether for a given $m$ the set $K(m,s^n)$ is empty or not. It will also turn out that the specific choice of $m$ is only of minor importance. We therefore define

\[ T(s^n) := \{(u_1^n,\dots,u_K^n) : s^n \in M(u_k^n) \text{ for at least one } k \in [K]\}. \tag{58} \]


With this and the previously obtained estimates such as (ICT) we can then lower bound the expected 
average success probability: 


M 


' M ^ ^ ^ s”)| 

1=1 S"GS" fegK(m.s") ' ^ 


Eps„ = ^P(C)-^ E E 


tr{ps",iifc„-D„i(C)} 


(59) 


M 


p®n(^n) 

kGK{Tn,s-^) ' ^ 

K 


K 


2Ep(c)i7E e e 


’ ^km 1 4 E E Pk'n,')} (60) 


_ _ _ _ ri<S’n(^n\ 

> E E E nbiyr ■ 


M 


-'‘E«‘P)jjE E !■*”<=”) E 


1 


k' — l m'^m 


K 


m—1 s^'GS’" 


kGK{m,s^) ^ ' k' = lm'^m 


K 


>p®"(T;:,)^mm_ ^ n^u(u^i) ^ 


1 




fceK(i,s") 


|iG(l,s")| 


• (1 - 2 -”''^/'‘) 


M K 

-'‘EKGiffE E »>“(»") E E E p*-} 


m—1 


k£K{m,s'^) 


k' = l m'^m 


K 


> p®"(T;(,)^ mm ^ n ^u«i)lT(«n)«i,.. ■ (1 - 2—^/4) 


M 


-4Ei’(Gj^E E !■*”<=”) E 

C m— kGK{m,s^) 

= P®"(T; i) min P(r(s’")) • (1 - 2-”-'5/4) 
s"eT"^ 






E E Pk'm '} 

k'= 1 m'^m 


(61) 

(62) 

(63) 

(64) 

(65) 

( 66 ) 

(67) 


M K 

J: E ^ ^ ( 68 ) 

C m— kGK{m,s^) I ^ ’ ''I k' = lm'^m 

for all large enough n. Here the first inequality is a consequence of the Hayashi-Nagaoka bound. The 
second follows from the definition of iF(m, s”) together with estimate (I^Tl) . From there until the last 
inequality we just keep rewriting the first term of the sum until it fits Lemma 7 in the appendix, which
we will apply later in equation (I8ip . 

We now start investigating the second term in the sum that lower bounds Eppsu. Since all code words 
are drawn independently we only need to consider the term with $m = 1$ in the following. Consequently,
our next goal is to give an upper bound on 


E p((«^-)f=E=2) E E 


1 


K M 


{'^km)k = i^rn=2 




k€K{l,s^) 


|iG(l,s")| 


EE Pk'm}- (69) 


k' — l m—2 


Due to the i.i.d. choice of codewords, the above quantity can be written as 


^ p((u.i)f=i) E ^^®”("”) E 


1 






kGK{l,s^) 






(70) 
with the average operator


K M 


K M 


E nn Puiukra) E E 


k—1 m—2 


= K-{M-l)- Y. 


1 


u^GTu 


\Tu\ 


P(w-) 




u”-^Tu 


(71) 

(72) 

(73) 


It is readily seen from these formulae that the only important calculation to be done is the following. For a $u^n \in T_U$ and $s^n \in M(u^n)$, calculate $\mathrm{tr}\{\rho_{s^n,u^n}A\}$. A very nice property of the POVM we utilize here is that $A$ is permutation-invariant. A very delicate property of our POVM is its instability with respect to the output states. We have to make sure that each of our $\rho_{s^n,u^n}$ looks, on average over $S_n$, like $\rho^{\otimes n}$; otherwise we stand no chance of getting the quantum relative entropy into the game. We achieve our goal as follows: Set $N(\cdot) := N(\cdot|s^n,u^n)$, and define $T_N := \{(v^n,t^n) : N(\cdot|v^n,t^n) = N(\cdot)\}$. Then


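\[ \mathrm{tr}\{\rho_{s^n,u^n}A\} = \frac{1}{|T_N|}\,\mathrm{tr}\Bigl\{\sum_{(v^n,t^n)\in T_N}\rho_{v^n,t^n}A\Bigr\} \leq \frac{1}{p_{SU}^{\otimes n}(T_N)}\,\mathrm{tr}\Bigl\{\Bigl(\sum_{s,u}p_{SU}(s,u)\rho_{s,u}\Bigr)^{\otimes n}A\Bigr\} = \frac{1}{p_{SU}^{\otimes n}(T_N)}\,\mathrm{tr}\{\rho^{\otimes n}A\}, \]

which is already very nice. We can now estimate that, for large enough $n \in \mathbb N$,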
Epsu > P(Vs” G G [K] : K{k, s") ^ 0) • p^^iTp^s) ■ (1 - 2-^'^/^) 

-4 ^ n{uki)Li) Y E 


> 1 - - 4 ■ 2 -" '5/4 . tr{p®"A}. 


|i^(l,s")lp|S(T(„.,..™)) 


tr{p®"A} 


(74) 

(75) 

(76) 

(77) 

(78) 

(79) 

(80) 
(81) 

(82) 


Here, the second inequality follows from Lemma [3 from fact (fT51) and from the fact that u" G A'(l,s") 
implies (with the help of [20l Lemma 2.3] (see the inequalities in (IT0l) l 


pfu(T/v( |s" M")) > iogpsu(s,M)) 

> ( 2 n)l®^’^l 2 “”''*/^ 

^ 2^ — n-Sj4: 


(83) 

(84) 

(85) 

( 86 ) 
(87) 
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if only n is large enough. The use of Lemma [7] does of course necessitate that ^ \og{K) > I{U] S) + 3iy{S). 
It remains to calculate A, a calculation that will make us employ some results from representation theory 
that were developed in m- The goal will be to show that, within small deviations, we have 

tr{;0®”A} (88) 

Together with the preceding calculations, this will prove our capacity result. 


The different code words used by the encoder are taken out of U” according to and chosen 
with equal probability on each of the sets Tn- We now want to estimate the symmetrized version of 
P{u^), more specifically the quantity tr{p®"^ TP„nr“^}. This is rather easy - since p®" is 

already invariant under permutations we get 


tr{p®" =tT{p^^P(u^)} 

= n E tr{p®‘(“)Py,4 


reSn 




< rr max tr{p®‘(“)P/A} 


(89) 

(90) 

(91) 




<T\{2t{u)f max t(u) 


«eu 


< (271)"*“ TT max 


(92) 

(93) 


< 


(2n)"*“ (94) 

tiGU 

= (277)-^“ n (95) 


ugU 


where 7 := max^gj^] | log(rp( 7 ))|. We additionally set ui := max„gu D{pu\\p)^ in order to get the estimate 


tr{p®” J2 -^rPiu^)T-^} < 


(96) 


res„ 


^ ^n-(x(pu,Wu^,c) + ^ log(n) + |U|(-g^(7+<^)-;g^ 

_ 2"-(x(pu,M7u-oc)+k('5))^ 

which holds for all large enough $n$ and with the obvious but not unambiguous definition of $\kappa$ which
ensures that $\lim_{\delta\to 0}\kappa(\delta) = 0$ (note that $\beta(p_U) := \min\{p_U(u) : p_U(u) > 0\}$). Overall, this leads to

Epsu > 1 - 2-”-'^/^ - 4 • 2-”'^/"‘ ■ K ■ M ■ 2 -"-(x(pu.iVu^ac)+«( 0 )^ (IOQ) 

and therefore asymptotically reliable communication is possible (on average over all codebooks) whenever

\[ \frac{1}{n}\log(K\cdot M) < \chi(p_U, W_{U\to\mathcal K}) - \kappa(\delta) - \delta/4 \tag{101} \]

\[ \frac{1}{n}\log(K) \geq I(U;S) + \nu(\delta) \tag{102} \]

and $\delta$ is so small that $I(U;S) > \nu(\delta)$. Thus under the above preliminaries we know that for every $\varepsilon > 0$ there has to exist at least one sequence of code words such that the corresponding code has asymptotically vanishing error and rate bounded by $\liminf_{n\to\infty}\frac{1}{n}\log(M_n) \geq \chi(p_U, W_{U\to\mathcal K}) - I(U;S) - \varepsilon$. One may remove the randomness in the encoder if necessary. The proof now only works for distributions $p_{SU}$ for which $p_U$ is an empirical distribution. Thus, an additional step is to approximate an arbitrary $p_{SU}$ by one for which $p_U \in \mathcal P_0^n(U)$. Continuity of the Holevo-information then yields the desired result. □

Direct part of Theorem [7J This result follows trivially from the channel coding results for the discrete 
memoryless case without any additional state knowledge. For the reader's convenience, we demonstrate here
how the use of our POVMs delivers an elegant and streamlined proof of the direct part of the channel
coding theorem with successive decoders. To this end we employ the beautiful non-commutative union
bound:

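Theorem 3 (Noncommutative union bound [56]). For a sub-normalized state $\sigma$ such that $\sigma \geq 0$ and $\mathrm{tr}\{\sigma\} \leq 1$, and orthogonal projections $P_1,\dots,P_M$, the following estimate holds true:

\[ \mathrm{tr}\{\sigma\} - \mathrm{tr}\{P_1\cdots P_M\,\sigma\,P_M\cdots P_1\} \leq 2\sqrt{\sum_{m=1}^M \mathrm{tr}\{(\mathbb 1 - P_m)\sigma\}}. \tag{103} \]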


A first version of the bound was published in [T] and then improved by Sen in [IS] , Wilde [SB] and Gao 
m- The work |46j of Sen showed for the first time how to use a (sequential) von-Neumann measurement 
at the decoder to achieve the capacity of a cq-channel. Sen’s original method of proof |1B] uses an approxi¬ 
mation step that transforms the channel first and then applies the sequential decoder to that channel, after 
which it is shown that this yields an asymptotically optimal result also for the original channel. We save 
these two approximation steps and proceed much more directly: Let $(\rho_{s,x})_{s\in S,x\in X}$ define the cq-channel. Let a finite alphabet $U$ be given, and a conditional distribution $(v(\cdot|s,u))_{s\in S,u\in U}$ together with a distribution $p_U \in \mathcal P(U)$. Without loss of generality, $\beta(p_U) > 0$. Define $\rho_u := \sum_{s\in S}\sum_{x\in X} v(x|s,u)\,p_S(s)\,\rho_{s,x}$ and $\rho := \sum_{u\in U} p_U(u)\rho_u$. Let $n \in \mathbb N$ and $\beta(p_U) > |U|\cdot\delta > 0$. For every $u^n \in U^n$, let $P(u^n)$ be the projection as defined in (30) and $P'(u^n) := \mathbb 1 - P(u^n)$. Choose $M \in \mathbb N$ codewords $u_1^n,\dots,u_M^n \in U^n$ i.i.d. according to $p'_U$, where

PuK) ■■=pTi^n ■ • \pT(.T-^,s)\-^- (104) 


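For every random choice of sequences $(u_1^n,\dots,u_M^n)$ let these sequences together with the POVM defined by $D_1 := P(u_1^n)$ and for all $m \geq 2$ by $D_m := P'(u_1^n)\cdots P'(u_{m-1}^n)P(u_m^n)P'(u_{m-1}^n)\cdots P'(u_1^n)$, $m = 1,\dots,M$, form the random choice of code $\mathcal K_n$. In order to make clear that $D_1,\dots,D_M$ forms a POVM we consider an arbitrary but fixed choice of $(u_m^n)_{m=1}^M$. We introduce the abbreviations $P_m := P(u_m^n)$, $P'_m := \mathbb 1 - P_m$ and $D'_m := P'_1\cdots P'_{m-1}P'_mP'_{m-1}\cdots P'_1$. It is easily seen that for all $m \geq 3$ it holds

\[ D_{m-1} + D_m + D'_m = P'_1\cdots P'_{m-2}P_{m-1}P'_{m-2}\cdots P'_1 + P'_1\cdots P'_{m-1}(P_m + P'_m)P'_{m-1}\cdots P'_1 \tag{105} \]

\[ = P'_1\cdots P'_{m-3}P'_{m-2}P'_{m-3}\cdots P'_1 \tag{106} \]

\[ = D'_{m-2}, \tag{107} \]

and, for every $m \geq 2$,

\[ D_m + D'_m = D'_{m-1}. \tag{108} \]

It follows that

\[ \sum_{m=1}^{M} D_m + D'_M = \sum_{m=1}^{M-2} D_m + D'_{M-2} \tag{109} \]

\[ = \sum_{m=1}^{M-3} D_m + D'_{M-3} \tag{110} \]

\[ = \dots \tag{111} \]

\[ = D_1 + D'_1 \tag{112} \]

\[ = P_1 + (\mathbb 1 - P_1) \tag{113} \]

\[ = \mathbb 1. \tag{114} \]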
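
The telescoping identity behind the sequential decoder can be verified exactly on a toy example. The Python sketch below builds operators of the form $P'_1\cdots P'_{m-1}P_mP'_{m-1}\cdots P'_1$ from three rank-one projections on a two-dimensional space and checks that, together with the remainder term $P'_1\cdots P'_M\cdots P'_1$, they sum to the identity. The chosen vectors and all helper names (`matmul`, `sandwich`, and so on) are assumptions made for this illustration; rational arithmetic is used so that the check is exact.

```python
from fractions import Fraction as F

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def projector(v):
    """Rank-one projection onto a unit vector v with rational entries."""
    assert sum(x * x for x in v) == 1
    return [[a * b for b in v] for a in v]

def complement(P):
    n = len(P)
    return [[F(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)]

def sandwich(prefix, middle):
    """Return C_1 ... C_k middle C_k ... C_1 for prefix = [C_1, ..., C_k]."""
    out = middle
    for C in reversed(prefix):
        out = matmul(matmul(C, out), C)
    return out

vectors = [(F(3, 5), F(4, 5)), (F(1), F(0)), (F(5, 13), F(12, 13))]
P = [projector(v) for v in vectors]
Pc = [complement(p) for p in P]
M = len(P)

D = [sandwich(Pc[:m], P[m]) for m in range(M)]    # decoding operators
D_rest = sandwich(Pc[:M - 1], Pc[M - 1])          # remainder term

total = D_rest
for d in D:
    total = add(total, d)
print(total == identity(2))
```

Since the remainder term is positive semidefinite, the operators in `D` alone sum to at most the identity, which is the sub-normalization used in the text.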


This implies that the operators $D_1,\dots,D_M$ constitute a (sub-normalized, but this causes no problem as adding the measurement operator $D_0 := \mathbb 1 - \sum_{m=1}^M D_m$ together with an arbitrary codeword can only decrease the average error of the code) POVM. The expected error over the random choice of code is
upper bounded as


Eerr(-) = 1 - ^ IfPuW)^ E 11 ^^nPu^} 


M 


<1+ E noiitoigE b 


M 


< — V 2 

- M ^ 





/m —1 


E n?'u«) E tr{P(u^)p„rj^} -I- tr{P'(u(),)p„n } 


(115) 

(116) 

(117) 


,fc=i 


M 


< — V 2 

- M ^ \ 

m—1 \ 


1 


m—1 


m—1 


_ <2—n5j2 


E n P'hi'O E + ^p'^(u^)%T{P'(u^)pur^} 


1 i—1 


fc=l 


M 


< — E 2 , 

- M ^ \ 

m—1 \ 


1 


m — 1 


m—1 


1 _ 2-’T’'5/2 


E n l^u«) E tr{P«)p®"} -h 2-" 


— n5j2 


(118) 

(119) 


1 i—1 


fc=l 


where we have used the non-commutative union bound, Jensen’s inequality and the estimate 
tr{P'{vD)pu^} < from inequality (IdTI) . It further follows by trivial modifications of the esti¬ 

mates ((Ml) that there exists a function k' : (0,1/2) —>■ ]R_|_ satisfying lim 5 _^oK^(J) = 0 such that for all 
large enough n (depending on J) we have 


Eerr(-) < 2 


1 


-M ■ 2“"(x(pu,Wsxx->x;oV')-k'((5) _|_ 2-'r>'S/'2^ 


1 _ 2 -’T’V 2 ■ 

and that clearly demonstrates that all rates below $\chi(p_U, W_{S\times X\to\mathcal K}\circ V)$ are achievable.
Converse part of Theorem]^ Let (/C„)neN be a sequence of codes such that for all n € N 


^E E E e(x"|m,s”)tr{psn_,j,„P„} = 1 - e„ 




M, 


( 120 ) 

□ 

( 121 ) 


m=l 


2 .n^X” 


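for some sequence $(\varepsilon_n)_{n\in\mathbb N}$ of nonnegative numbers satisfying $\lim_{n\to\infty}\varepsilon_n = 0$. Define the random variables $(\mathfrak M_n, \mathfrak S^n, \mathfrak X^n, \hat{\mathfrak M}_n)$ taking values in $[K_n]\times S^n\times X^n\times[K_n]$ via their distributions

\[ \mathbb P((\mathfrak M_n,\mathfrak S^n,\mathfrak X^n,\hat{\mathfrak M}_n) = (m,s^n,x^n,\hat m)) = p^{\otimes n}(s^n)\,\frac{1}{M_n}\,e(x^n|m,s^n)\,\mathrm{tr}\{D_{\hat m}\rho_{s^n,x^n}\}. \tag{122} \]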
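Fano's inequality implies that for all large enough $n \in \mathbb N$ it holds

\[ H(\mathfrak M_n|\hat{\mathfrak M}_n) \leq n\cdot\varepsilon_n\cdot|X|. \tag{123} \]

From there we conclude (by noting that $\log(K_n) = H(\mathfrak M_n)$) that

\[ \log(M_n) \leq I(\mathfrak M_n;\hat{\mathfrak M}_n) + n\cdot\varepsilon_n\cdot|X| \tag{124} \]

\[ \leq \chi(\mathfrak M_n;Q^n) + n\cdot\varepsilon_n\cdot|X| \tag{125} \]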

where the ensemble under consideration is given by (]g-,X]x"eX" e(a;"|TO, s”)p®”(s")bFsn (a;"))()f^^ 

and we employed the Holevo bound. At this point, it is convenient to write the Holevo information in 
terms of the quantum mutual information: Let the overall state of the system be 

^ X] X • V'm ® 0 e(a:"|m,s”)V'a:" ® 0 tr{i:)mPs".a:'*}'*/'mi (126) 

m,7hG[Mn] ^ 

where for convenience we embedded the classical variables into quantum systems by using orthogonal 
rank-one projections tjji (meaning e.g. that each i/jm £ satisfies 1 > tpm > 0, trj^/^m} = 1, 

V’m = V'm s-nd '0m = 0m)- This notation allows us to write the Holevo information as a standard 
quantum mutual information, a fact that we utilize in order to keep track of the dependencies between 
the various systems and subsystems that show up during our proof. As a first step, let us write 


\[ \log(M_n) \leq I(\mathfrak M_n;Q^n) + n\cdot\varepsilon_n\cdot|X| \tag{127} \]

\[ = \sum_{i=1}^n I(\mathfrak M_n;Q_i|Q^{i-1}) + n\cdot\varepsilon_n\cdot|X| \tag{128} \]

\[ \leq \sum_{i=1}^n I(\mathfrak M_n,Q^{i-1};Q_i) + n\cdot\varepsilon_n\cdot|X|. \tag{129} \]

Here the last inequality follows from $S(AB) \leq S(A)+S(B)$ (subadditivity of von Neumann entropy) and the equality by definition of conditional quantum mutual information as $I(A;B|C) := S(AC) + S(BC) - S(ABC) - S(C)$ and a telescope sum argument. We continue with our upper bound by noting that quantum mutual information obeys the data processing inequality [55, Corollary 11.9.4], which allows us to loosen our bound as

\[ \log(M_n) \leq \sum_{i=1}^n I(\mathfrak M_n,Q^{i-1},\mathfrak S^{i-1};Q_i) + n\cdot\varepsilon_n\cdot|X| \tag{130} \]

\[ \leq \sum_{i=1}^n I(\mathfrak M_n,Q^{i-1},\mathfrak S^{i-1},\mathfrak X^{i-1};Q_i) + n\cdot\varepsilon_n\cdot|X|. \tag{131} \]


At this point, it is possible to use the structure of causal codes in order to relieve us of the problematic
term $Q^{i-1}$. For every $i \in [n]$, write


/(9n„,Q*-0s*-0r-0Q0 


5(®1„, + S{Q,) - SiWln, (132) 


5(911„, 6*-0 r-1) + |91l„, 6*-0 r-1) + 5(Q0 (133) 

/(9n„, 6*-0 V-0 Q0 + 5(Q*-1 |®l„, (134) 

/(97i„, 6*-0 r-0 Q0 + ( 135 ) 

- 6*-0 r-1) + 5(Q,|an„, 6*-0 r-1) 

/(97l„,6*-0r-0g0. (136) 
Most of the above equalities follow trivially from the definition of relative entropy. Even the last one
holds for a non-causal encoder as well.

It is still worth noting that the system Q^~^Qi is in a product state given the classical data (m, 

That this is so is a consequence of the fact that causality is respected at the encoder. More precisely, it 
holds by definition of the encoder that 


for an appropriately defined e(m, s*) £ ^(X). Therefore, the system Q® ^Qi has the following state given 

( to , 


I X! X! ^:Si))p{Si)ps,^Xi 

VsiGSxiGX ) 

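We thus get the upper bound

\[ \log(K_n) \leq \sum_{i=1}^n I(\mathfrak M_n,\mathfrak S^{i-1},\mathfrak X^{i-1};Q_i) + n\cdot\varepsilon_n\cdot|X|, \]

and setting $U_i := (\mathfrak M_n,\mathfrak S^{i-1})$ this can be written as

\[ \log(K_n) \leq \sum_{i=1}^n I(U_i,\mathfrak X^{i-1};Q_i) + n\cdot\varepsilon_n\cdot|X|. \]

Of course this implies the existence of at least one $i \in [n]$ such that

\[ \frac{1}{n}\log(K_n) \leq I(U_i,\mathfrak X^{i-1};Q_i) + \varepsilon_n\cdot|X|. \]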

The structure of the classical random variables involved here is such that 


( 138 ) 


( 139 ) 


( 140 ) 


( 141 ) 


P((17i,X* '^,Si,Xi) = {m,s^ ^,x^ '^,Si,Xi)) = - - ^ -^p(si)ei_i(x* \TO)e(a;i|TO, (s* ^,Si)), 


M 


( 142 ) 


and since $\mathfrak X^{i-1}$ is only dependent on $U_i$ here, it follows that

\[ \frac{1}{n}\log(K_n) \leq I(U_i;Q_i) + \varepsilon_n\cdot|X|. \tag{143} \]

The distribution of $(S_i,U_i,X_i)$ is such that with an appropriate choice of $p \in \mathcal P(U)$ where $U := [M_n]\times S^{i-1}$ we have for all $u \in U$, $s_i \in S$ and $x_i \in X$ that

\[ \mathbb P((S_i,U_i,X_i) = (s_i,u,x_i)) = p(u)\,p(s_i)\,e(x_i|u,s_i) \tag{144} \]

such that the theorem is proven by taking the limit $n \to \infty$ and by noting that we can define a channel $V \in \mathrm{Ch}(U, S\times X)$ by setting $v(s,x|u) := e(x|u,s)\,p(s)$ for all $s \in S$, $u \in U$ and $x \in X$ and that under this assumption and with the state under consideration having the form

\[ \sum_{u\in U}\sum_{s\in S}\sum_{x\in X} p(u)\,p(s)\,e(x|u,s)\,\psi_u\otimes\rho_{s,x} \tag{145} \]

which is clearly classical-quantum over the cut between $U$ and the other systems we get

\[ I(U_i;Q_i) = S\Bigl(\sum_{u\in U}\sum_{s\in S}\sum_{x\in X} p(u)p(s)e(x|u,s)\rho_{s,x}\Bigr) - \sum_{u\in U} p(u)\,S\Bigl(\sum_{s\in S}\sum_{x\in X} p(s)e(x|u,s)\rho_{s,x}\Bigr) \tag{146} \]

\[ = S(W_{S\times X\to\mathcal K}\circ V(p)) - \sum_{u\in U} p(u)\,S(W_{S\times X\to\mathcal K}\circ V(u)) \tag{147} \]

\[ = \chi(p, W_{S\times X\to\mathcal K}\circ V). \tag{148} \]

□
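
The Holevo quantity appearing at the end of this proof is an entropy of an average state minus an average of entropies. For intuition, the following Python sketch evaluates $\chi = S(\sum_u p(u)\rho_u) - \sum_u p(u)S(\rho_u)$ for qubit states given by Bloch vectors, where $S(\rho)$ is the binary entropy of the eigenvalue $(1+|r|)/2$. The restriction to qubits, the Bloch-vector parametrization and the names `qubit_entropy` and `holevo` are assumptions made for this illustration; in the proof the role of $\rho_u$ is played by $\sum_{s,x} p(s)e(x|u,s)\rho_{s,x}$.

```python
from math import log2, sqrt

def binary_entropy(x):
    """h(x) = -x log2 x - (1-x) log2 (1-x), with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1.0 - x) * log2(1.0 - x)

def qubit_entropy(r):
    """Von Neumann entropy of the qubit state (1 + r . sigma)/2, |r| <= 1."""
    norm = min(1.0, sqrt(sum(c * c for c in r)))
    return binary_entropy((1.0 + norm) / 2.0)

def holevo(probs, bloch_vectors):
    """chi = S(sum_u p(u) rho_u) - sum_u p(u) S(rho_u) for qubit ensembles."""
    avg = [sum(p * r[i] for p, r in zip(probs, bloch_vectors)) for i in range(3)]
    return qubit_entropy(avg) - sum(
        p * qubit_entropy(r) for p, r in zip(probs, bloch_vectors))

chi_orthogonal = holevo([0.5, 0.5], [(0, 0, 1), (0, 0, -1)])
chi_identical = holevo([0.5, 0.5], [(0, 0, 1), (0, 0, 1)])
chi_mixed = holevo([0.5, 0.5], [(0, 0, 0.5), (0.5, 0, 0)])
print(chi_orthogonal, chi_identical, chi_mixed)
```

Two orthogonal pure states give one bit, identical states give zero, and every other ensemble of two qubit states lies in between.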



Converse part of Theorem\^ Let (/Cn)neN be a sequence of codes such that for all n £ N 


1 


iVln 

E E 


m—l 


^ e{x^\m,s^)tr{Ws^{x'^)Dm} = 1 - e 


( 149 ) 


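for some sequence $(\varepsilon_n)_{n\in\mathbb N}$ of nonnegative numbers satisfying $\limsup_{n\to\infty}\varepsilon_n = 0$. Also, assume that $\log K_n = R - \varepsilon_n$ for all $n \in \mathbb N$. Then, define the random variables $(\mathfrak M_n,\mathfrak S^n,\mathfrak X^n,\hat{\mathfrak M}_n)$ taking values in $[K_n]\times S^n\times X^n\times[K_n]$ via their distributions

\[ \mathbb P((\mathfrak M_n,\mathfrak S^n,\mathfrak X^n,\hat{\mathfrak M}_n) = (m,s^n,x^n,\hat m)) = p^{\otimes n}(s^n)\,\frac{1}{K_n}\,e(x^n|m,s^n)\,\mathrm{tr}\{D_{\hat m}W_{s^n}(x^n)\}. \tag{150} \]

Then by Fano's inequality we have that, for all large enough $n \in \mathbb N$, we get the upper bound

\[ H(\mathfrak M_n|\hat{\mathfrak M}_n) \leq \varepsilon_n\cdot|X|. \tag{151} \]

From there we conclude (by noting that $\log(K_n) = H(\mathfrak M_n)$) that

\[ I(\mathfrak M_n;\hat{\mathfrak M}_n) \geq \log(K_n) - \varepsilon_n\cdot|X|. \tag{152} \]

Also, since $\mathfrak M_n$ and $\mathfrak S^n$ are independent, we get

\[ I(\mathfrak M_n;\hat{\mathfrak M}_n) - I(\mathfrak M_n;\mathfrak S^n) \geq \log(K_n) - \varepsilon_n\cdot|X|. \tag{153} \]
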

From the Holevo bound we can then conclude that, using a quantum system in the overall state 

^ E <8ie{x'^\m,s'^)tjj^n (Si (8itY{A^Psr^^^n}llj^ (154) 

m,rfiG[M„] " s"GS" 

where we remind the reader that ' 0 i •= l*)(*l is used as a shorthand for the orthogonal pure states 
corresponding to the realizations of certain random variables. While there is no strict necessity to do so, 
we use the standard embedding fp(A) 9 r J2aeA ’'(®)V'a in order to embed the overall state into a 
complete quantum system. We then have 


log(M„) < x(911„; Q") - /(97l„; 6 ") + e„ • |X| 

= xiPu; ) - /(il; 6 ") + e„ • |X|. 


(155) 

(156) 


Here, we simply set il := in order to make this bound look more familiar. We then define the set 

U := {a : a = ^ q{u,x'^\s'^) ^ p®”(s”) •'(/im (g) V's-® (157) 

uGU s"GS" 

and observe that the state au',S’^,Q’^ ■= Wu' S" Q" A" contained in U with the special choice 
g(M',i”|s”) := (1(5;"', x”(u', s")) • (1/M„). This produces (for all large enough n £ N) the upper bound 

log(M„) < max (x(pu; VFu^k:®") - A(il; 6")) + e„ • |X|. (158) 

PS"UX"e^n 

Clearly, the validity of such an upper bound produces a multi-letter converse. Since we have a single-letter 
direct part we can use the usual blocking arguments in order to match the upper bound. So, at least it 
seems that we have a complete coding result. □ 

Cardinality bounds and structure of optimizers. Let us first consider the case of causal information at the 
encoder. Assume that the optimization is carried out on an alphabet $U'$ of size $|U'| > |X|\cdot|S| + 1$. Observe that, since the encoding is given by stochastic matrices $v(\cdot|s,u)$, the following is true: If $q' \in \mathcal P(U')$ together with some $v$ is any solution to the optimization problem of Theorem [T] then it holds for all $s \in S$ that the $S$-marginal of the solution $p_{SU'X}$ defined via $p_{SU'X}(s,u',x) := p(s)q'(u')v(x|s,u')$ for all $s \in S$, $u' \in U'$ and $x \in X$ satisfies

\[ p_{S|U'}(s|u') = p(s) \geq \beta(p). \tag{159} \]

Now define for each $x \in X$ and $s \in S$ a function $f_{s,x} : \{r \in \mathcal P(S\times X) : r_S(s) \geq \beta(p)\ \forall s \in S\} \to \mathbb R_+$ by $f_{s,x}(r) := p(s)\cdot r(s,x)/r_S(s)$. The domain of each $f_{s,x}$ is a convex and compact subset of $\mathcal P(S\times X)$. Then we see that inequality (159) implies that for all $s \in S$, $u' \in U'$ and $x \in X$

\[ f_{s,x}(p_{SX|U'}(\cdot|u')) = p(s)\cdot p_{SX|U'}(s,x|u')/p_{S|U'}(s|u') = p_{SX|U'}(s,x|u') \tag{160} \]

holds, a fact which we will need soon. Define $f : \{r \in \mathcal P(S\times X) : r_S(s) \geq \beta(p)\ \forall s \in S\} \to \mathbb R_+$ by $f(r) := S(\sum_{s,x} f_{s,x}(r)\rho_{s,x})$. Let $q' \in \mathcal P(U')$ and $p_{SX|U'}$ solve the optimization problem on $U'$, meaning
that


max max 

qe'Pfu) yGChp(u,sxx) 


xih bTsxx->/c o V)) 


(161) 


= >5'( X] q' iu')psx\u' {s, x\u)ps,x) -'^q'{u')SC^psx\u'{s,x\u')ps,x) 


u' ,S,X 


S,X 


= S{y^^q'{u')'^fs,a:{p:x.\su'{-\u'))Ps,x) - y^9'(u')/(psx|u'0'))- (162) 




According to e.g. the proof of [31, Lemma 3] (which needs only compactness of the domain of the $f_{s,x}$ and of $f$), there exists a set $U$ of cardinality bounded by $|U| \leq |S|\cdot(|X|-1)+2$ (note here that for each $s \in S$ one of the $f_{s,x}$ does not have to be 'pinned' here due to normalization) and a $q \in \mathcal P(U)$ and a conditional probability distribution $p_{SX|U}$ such that for all $s \in S$ and $x \in X$ it holds

\[ \sum_{u'} q'(u')\,f_{s,x}(p_{SX|U'}(\cdot|u')) = \sum_{u\in U} q(u)\,f_{s,x}(p_{SX|U}(\cdot|u)) \tag{163} \]

\[ \sum_{u'} q'(u')\,f(p_{SX|U'}(\cdot|u')) = \sum_{u\in U} q(u)\,f(p_{SX|U}(\cdot|u)). \tag{164} \]

This implies that 

max max x{q,Wsxx^ic o V)) (165) 

(2eq3(U') veChp(u,sxx) 


= 'S'( V Q(u)fs,x(psxiu(-lu))Ps 




o(PSXIu(-lu))Ps,x)- 


This proves that $(s,x,u) \mapsto q(u)f_{s,x}(p_{SX|U}(\cdot|u))$ is a solution to the optimization problem (note that the marginal on $S\times U$ has to be of product form by definition of the optimization problem for the causal case) as well which additionally satisfies the bound $|U| \leq |S|(|X|-1)+2$.


The case of non-causal state information can be treated completely similarly to the one above: Assume that $n = 1$ for the start. This time we do not have to ensure that $(S,U)$ are independent so it is enough to define functions $f_{s,x} : \mathcal P(S\times X) \to \mathbb R_+$ via $f_{s,x}(r) := r(s,x)$, $f(r) := S(\sum_{s,x} r(s,x)\rho_{s,x})$ and $g(r) := H(r_S)$, then identical arguments produce the bound

\[ |U| \leq |S|\cdot(|X|+1). \tag{166} \]

Thus for every $n \in \mathbb N$ it trivially holds that $|U_n| \leq |S|^n\cdot(2\cdot|X|)^n$. □
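
The arithmetic behind the two cardinality bounds is easy to tabulate. The Python sketch below evaluates $|S|(|X|-1)+2$ for the causal case and $|S|(|X|+1)$ for the non-causal case, and checks that applying the non-causal bound to the $n$-letter alphabets $S^n$ and $X^n$ stays below $|S|^n(2|X|)^n$. The function names and the example alphabet sizes are assumptions chosen for illustration.

```python
def causal_bound(size_s, size_x):
    """Bound |U| <= |S| (|X| - 1) + 2 for causal state information."""
    return size_s * (size_x - 1) + 2

def noncausal_bound(size_s, size_x):
    """Bound |U| <= |S| (|X| + 1) for non-causal state information."""
    return size_s * (size_x + 1)

def noncausal_bound_n(size_s, size_x, n):
    """Single-letter bound applied to the n-letter alphabets S^n and X^n."""
    return noncausal_bound(size_s ** n, size_x ** n)

def crude_bound_n(size_s, size_x, n):
    """The weaker but simpler estimate |S|^n (2|X|)^n."""
    return size_s ** n * (2 * size_x) ** n

for n in range(1, 6):
    print(n, noncausal_bound_n(2, 3, n), crude_bound_n(2, 3, n))
```

The crude estimate follows from $|X|^n + 1 \leq 2^n|X|^n$, which holds for every $n \geq 1$ and $|X| \geq 1$.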


VII Appendix 

Lemma 2 (C.f. [H]). Let $p \in \mathcal P(X)$. For every $n \geq |X|^2$, there is $p' \in \mathcal P_0^n(X)$ such that

\[ \|p - p'\|_1 \leq \frac{2|X|}{n} \tag{167} \]

and $p(x) = 0$ implies $p'(x) = 0$ for all $x \in X$.
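Proof of Lemma 2. Let $n \in \mathbb N$ be arbitrary. Set $X' := \{x \in X : p(x) > 0\}$. From the next lines it will follow that, without loss of generality, we may assume $X = X'$. For sake of simplicity, assume again without loss of generality that $X = \{1,\dots,|X|\}$ and that $p(|X|) \geq 1/|X|$. Choose $p'(i) \in \{0, 1/n, \dots, 1\}$ for $i = 1,\dots,|X|-1$, such that $|p'(i) - p(i)| \leq 1/n$. Clearly, this is possible. Then necessarily $p'(|X|) = 1 - \sum_{i=1}^{|X|-1} p'(i)$ and

\[ \|p - p'\|_1 = \sum_{i=1}^{|X|-1} |p'(i) - p(i)| + |p'(|X|) - p(|X|)| \tag{168} \]

\[ \leq \frac{|X|-1}{n} + \Bigl|\sum_{i=1}^{|X|-1} (p(i) - p'(i))\Bigr| \tag{169} \]

\[ \leq \frac{|X|-1}{n} + \sum_{i=1}^{|X|-1} |p'(i) - p(i)| \tag{170} \]

\[ \leq \frac{2|X|}{n}. \tag{171} \]

Of course, while all the $p'(i) \geq 0$ by construction if $i < |X|$, this does not hold for $p'(|X|)$. This is where we need the additional condition that $n \geq |X|^2$:

\[ p'(|X|) = 1 - \sum_{i=1}^{|X|-1} p'(i) \tag{172} \]

\[ \geq 1 - \sum_{i=1}^{|X|-1} p(i) - \frac{|X|-1}{n} \tag{173} \]

\[ \geq p(|X|) - \frac{|X|-1}{n} \tag{174} \]

\[ \geq \frac{1}{|X|} - \frac{|X|-1}{n} \tag{175} \]

\[ > 0. \tag{176} \]

□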
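
The approximation of a distribution by a type can be sketched in a few lines of Python. The variant below rounds every entry except a largest one down to a multiple of $1/n$ and assigns the remaining mass to that largest entry; this is one concrete rounding rule chosen here for illustration (the proof only requires $|p'(i) - p(i)| \leq 1/n$), and the function names are assumptions. Exact rational arithmetic avoids floating-point rounding issues.

```python
from fractions import Fraction
from math import floor

def approximate_by_type(p, n):
    """Return an n-type p' close to p that vanishes wherever p vanishes."""
    p = [Fraction(x) for x in p]
    assert sum(p) == 1 and all(x >= 0 for x in p)
    largest = max(range(len(p)), key=lambda i: p[i])
    counts = [floor(n * x) for x in p]          # round down to multiples of 1/n
    counts[largest] = n - sum(c for i, c in enumerate(counts) if i != largest)
    return [Fraction(c, n) for c in counts]

def l1_distance(p, q):
    return sum(abs(Fraction(a) - Fraction(b)) for a, b in zip(p, q))

p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6), Fraction(0)]
n = 10
p_type = approximate_by_type(p, n)
print(p_type, l1_distance(p, p_type))
```

Because the non-maximal entries are rounded down, the entry receiving the remaining mass can only grow, so nonnegativity holds for every $n$ in this variant.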


Lemma 3. Let $p \in \mathcal P_0^n(U)$, $q \in \mathcal P(S)$ and $p_{SU} \in \mathcal P(S\times U)$ be any distribution such that $p_U = p$ and $p_S = q$. If $\delta < \tfrac{1}{2}\beta(p_{SU})$ and $n \geq 4\cdot|U|\cdot\max\{|S|, 1/\beta(p_{SU})\}$ then for every $s^n$ satisfying $\|N(\cdot|s^n) - q\| \leq \delta$ there exists $u^n \in T_p$ such that $\|N(\cdot|s^n,u^n) - p_{SU}\| \leq 2\delta$.

Proof. For sake of simplicity, let S = {1,...,5'}. Let /3 := /3(psu)- Let psu(s,u) = q{s)w{u\s) and 
||iV(-|s") — 9 II < d. Then for every u,s we have iV(s|s")i(;(it|s) > q{s)w{u\s) — d ■ w{u\s). It follows that 
I3{p'su) > Id — d. We may assume that q{s) > 0 for all s £ S, and that N{s) > 0 Vs £ S since otherwise it 
holds ( 7 ®"(JAr) = 0. Now, for each s = 1,..., 5 — 1, apply Lemma[2]to define a type N{s, •) on U which 
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satisfies ||7V(s, — w(-|s)|| < 77 ^- It then holds for every it G U that 

s-i 

|A^(S', u) — nq{S)w{u\S)\ = \N{u) — N{s, u) — ng(S')te(M|S')| (177) 

S = 1 

S 

< |-/V(u) — '^N{s)w{u\s) — ng(5')w(M|5')| + 2|U| (178) 

S = 1 

S 

< |A^('u) — nq{s)w{u\s) — nq{S)w{u\S)\ + 2|U| + n6 (179) 

= \N{u)-npuiu)\ + 2\V\+nd (180) 

= 2|U|+n5. (181) 

Therefore, we have that for all m G U 

N{S,u)>n{P-5)-2\V\. (182) 


Thus if (5 < /3/2 and n > 4|U|//1 the construction works. It remains to calculate the distance of N to 
Psu- 


||1V - psull = ^(■5)ll^(s> \N{S, u) - q{S)w{u\S)\ (183) 

+ ,184) 

n n 

< 25 (185) 

ifn>4|U|S|. □ 

Lemma 4 (C.f. [21]). Let $a^n \in A^n$ and $b^n \in B^n$. There exists a function $\kappa : \mathbb N \to \mathbb R_+$ such that with $(A,B)$ being distributed as $\mathbb P((A,B) = (a,b)) = \frac{1}{n}N(a,b|a^n,b^n)$ we have

\[ |\{\hat a^n : N(\cdot|\hat a^n,b^n) = N(\cdot|a^n,b^n)\}| = 2^{n(H(A|B) - \kappa(n))}. \tag{186} \]

The function $\kappa$ satisfies $\lim_{n\to\infty}\kappa(n) = 0$.

The following Lemma is basically taken from [20]. It would generally be completely sufficient for 
proving all our statements in sufficient generality. 

Lemma 5. Let D{p\\q) < 5. For the function f\ : [0,1/2] —>■ R+ defined by fAx) := —■\/a;/21og(a;|Zp) 
we have that 


\H{p)-Hiq)\<fAS). (187) 

Clearly, lim^^o fA^) = 9. 

Note that p{x) = 0 implies p'{x\s) = 0 for all s G 5, by construction. 

Proof. From Pinsker’s inequality we have ||p — g||i < and, accordingly, by Lemma 2.7 in [20] . 
|i7(p)-iJ(g)| <-v^log(v^/|Z|). □ 
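
The two ingredients of this proof can be checked numerically. The Python sketch below uses the standard forms, stated here as assumptions about which versions are meant: Pinsker's inequality $D(p\|q) \geq \|p-q\|_1^2/(2\ln 2)$ with the divergence measured in bits, and the entropy continuity bound $|H(p)-H(q)| \leq -\theta\log(\theta/|Z|)$ for $\theta = \|p-q\|_1 \leq 1/2$. The example distributions and function names are illustrative only, and the constants in the lemma as printed may be normalized differently.

```python
from math import log, log2

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * log2(x) for x in p if x > 0)

def kl_divergence(p, q):
    """Relative entropy D(p||q) in bits; requires q > 0 wherever p > 0."""
    return sum(a * log2(a / b) for a, b in zip(p, q) if a > 0)

def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def entropy_gap_bound(theta, alphabet_size):
    """Continuity bound -theta * log2(theta / |Z|), valid for 0 < theta <= 1/2."""
    return -theta * log2(theta / alphabet_size)

p = [0.5, 0.3, 0.2]
q = [0.45, 0.35, 0.2]
theta = l1_distance(p, q)
divergence = kl_divergence(p, q)
entropy_gap = abs(entropy(p) - entropy(q))
print(theta, divergence, entropy_gap, entropy_gap_bound(theta, len(p)))
```

Combining the two steps, a small divergence forces a small variational distance, which in turn forces a small entropy difference.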

Lemma 6 . Let b be a positive number. Let Zi,..., Zl be i.i.d. random variables with values in [0, b] and 
expectation KZi = v, and let 0 < e < ^. Then 

^ [(1 ±£)^]| < 2 exp (^-L- , (188) 

where $[(1 \pm \varepsilon)\nu]$ denotes the interval $[(1-\varepsilon)\nu, (1+\varepsilon)\nu]$.
The proof can be found in [53, Theorem 1.1] and in [B].

Lemma 7. Let psu € x S) have marginal distributions pu G iPp (U) and as before p = ps- Let 

n G N and ^P{psv) > i5 > 0. Let s” G T”^ For a random i.i.d. choice of K elements Ui, . .., uk G Tp^J, 
each drawn according to we have: If K > and I{U;S) > iy(S) then 

P(Vs'‘ G 3 A: G [iP] : s" G M(ufc)) > 1 - exp(nlog(|S|) - (igg) 


This implies, for all large enough n, the weaker estimate 

P(Vs" G Tp” ,5 3 fc G [if] : s’" G M(ufe)) > 1 - 2-^'^/'^. (190) 


Remark 4. Recall that M{u'^) := {s” : maxagu p^(m)Z1( N{-,u\s^, u”'||ps(-|'a)) < '5/2}- 

Proof. Let s’" G Tp^s be given. According to Lemma |3] there exists m” G Tp^ such that ||iV(-|s”,u*") — 
PsuII < 25. It follows from Lemma [4] and Lemma [5] that for all large enough n G N we have 


\{u^ G Tp^ : lV(-|s",u’") = 7V(-|s’",it’")}| > 

^ 2”(-f^(£^l'S')+'51og(i5/|S|)-/<7(n) 
> 2n{H{U\S)-u{S)/2) 


(191) 

(192) 

(193) 


where i/(5) := 451og(5/|S|). Thus by elementary type bounds we have that for every s’" G Tp^s and all 
large enough n G N we have for M'{s'^) := {it’" : s’" G M(it’")} 

E1m'(s") > 2^^LU-,S)-u{S)) ^ Q 

Of course 1 m'(s")(w’’) < 1 for all it" G U". Thus it is an immediate consequence of Lemma| 6 ]that 


P(Vs’" G T^fs 3 fc G [AT] : s’" G M(ufc)) = P(Vs’" G 


K 


K 

E 

/c=l 


lM'(s")(Ufc) > 0) 


= 1 - P(3s’" G Tp"), : 1 ^ lM'(.")(ufe) = 0) 

1 ^ 

> 1 - P(3s’" G Tp"), : - ^ lM'(.")(ufe) < 

k^l 

1 ^ 

^l-ISI"" max P(—^ lM'(s")(ufc) < iElM'(s")) 

> 1 — jSI"" • exp(—i • K ■ min Elj\^/(sn)) 

> 1 - |Sr exp(-i • K ■ 2 ^(Lu-,s)-u{S)^ 

> l-exp(nlog(|S|)-2’"'‘’('5)). 


It follows that for all s’" G S’" there exists at least one k G [K] such that s’" G M(ufc). 


(195) 

(196) 

(197) 

(198) 

(199) 

( 200 ) 
( 201 ) 

□ 
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